Inter-document Similarity in Web Searches

Authors

  • Bruno Martins
  • Bruno Emanuel Da Graça Martins
  • Mário Gaspar Da Silva
  • Luís Cabral De Moura Borges
  • André Osório
Abstract

Existing Web search services fail to help users whose information needs are broad, vague, or hard to express through a set of keywords. This dissertation investigates the use of retrieval techniques based on inter-document similarity, measured either through the textual contents of documents or through the linkage between them. Unlike traditional retrieval approaches, which match documents against keywords and produce one-dimensional ranked lists of results, techniques based on inter-document similarity offer better support for results visualization, as well as alternative ways of expressing information needs. A Portuguese Web search engine has been extended with two inter-document similarity algorithms: result set clustering and related pages. The system was evaluated in a user survey, which showed that both algorithms are well accepted.

Keywords: Web Information Retrieval, Clustering, Similarity Search, Web Mining.
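As a rough illustration of content-based inter-document similarity, the sketch below ranks "related pages" by the cosine similarity of simple term-frequency vectors. It is not the dissertation's method: the whitespace tokenization, the function names (`term_vector`, `cosine_similarity`, `related_pages`), and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pages(query_id, docs, top_k=3):
    """Return the ids of the top_k documents most similar to docs[query_id]."""
    query_vec = term_vector(docs[query_id])
    scored = [
        (cosine_similarity(query_vec, term_vector(text)), doc_id)
        for doc_id, text in docs.items()
        if doc_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy collection, used only for illustration.
docs = {
    "a": "web search engine ranking",
    "b": "web search engine clustering",
    "c": "cooking pasta recipes",
}
```

Calling `related_pages("a", docs, top_k=1)` on this toy collection selects document `"b"`, which shares three of its four terms with `"a"`. A link-based variant would replace the term vectors with vectors of inbound or outbound links, and the same pairwise similarities could also feed a clustering step over a result set.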

Similar resources

An Ensemble Click Model for Web Document Ranking

Every year, web search engine providers spend more and more money on ranking documents in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

The Role of Semantic Relations in Improving the Results of a Citation Recommendation System - Selected Paper of the 17th National Conference of the Computer Society of Iran

With the increasing growth of scientific documents on the Web, it is difficult to select a relevant document. A citation recommendation system receives a text and recommends documents to be cited by the text. Such recommendations help a researcher find relevant texts. Based on semantic relations, this paper presents a new indicator to measure the similarity between documents an...

Clustering multilingual documents by estimating text-to-text semantic relatedness

This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora-based approach and a web-search-based approach. Our first approach derives pairwi...

Effective Hybrid Recommendation Combining Users-Searches Correlations Using Tensors

Most recommendation methods employ item-item similarity measures or use ratings data to generate recommendations. These methods use traditional two-dimensional models to find interrelationships between similar users and products. This paper proposes a novel recommendation method using a multi-dimensional model, the tensor, to group similar users based on common search behaviour, and then finding a...

Web Document Classification based on Hyperlinks and Document Semantics

Besides the basic content, a web document also contains a set of hyperlinks pointing to other related documents. Hyperlinks in a document provide much information about its relation to other web documents. By analyzing hyperlinks in documents, inter-relationships among documents can be identified. In this paper, we will propose an algorithm to classify web documents into subsets based on hyperl...

Publication date: 2004